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CJ , Abstract. We prove an ^^(logTi) bound for the expected value of the log- 

I i ' arithm of the componentwise (and, a fortiori, the mixed) condition number 

of a random sparse n x n matrix. As a consequence, small bounds on the 
►^ , average loss of accuracy for triangular linear systems follow. 

^: 

vn . 

0\ ■ 1 Introduction 

p; 

t~^ ■ Triangular systems of linear equations provide one of the few examples in 

^^ , numerical linear algebra where a gap occurs between stability analysis and 

f— s I everyday practice. One could summarize this gap as follows: 

^ ^ ■ Triangular systems of equations are generally solved to high ac- 

S>^ . curacy in spite of being, in general, ill-conditioned. 

H : 

5t , This state of affairs had been already noted by J.H. Wilkinson in [9, p. 105]: 

"In practice one almost invariably finds that if L is ill-conditioned, so that 
||L||||L~^|| » 1, then the computed solution of Lx = b (or the computed 
inverse) is far more accurate than [what forward stability analysis] would 
suggest." 

An explanation to this gap is suggested by N. Higham [4] who notes 
that the backward error analysis given by Wilkinson for the solution of tri- 
angular systems yields (small) componentwise bounds on the perturbation 
matrix (see Section 6(1) below). Higham then uses this fact to deduce small 



'Partially supported by GRF grant CityU 100808 



forward error bounds for particular subclasses of triangular systems and to 
numerically investigate the accuracy of other particular such systems. In 
doing so, Higham makes use of the mixed condition number introduced by 
Skeel [5]^. This condition number has a natural role in analyzing accuracy 
of triangular systems since bounds for it, together with the backward anal- 
ysis of Wilkinson mentioned above, yield forward analysis bounds for the 
computed solution of the system. Furthermore, the restriction to compo- 
nentwise perturbations — both in the backward error analysis and in the 
mixed condition number — forces perturbations to preserve the triangular 
structure of the data matrices. 

A further step in explaining the gap, somehow orthogonal to the work 
of Higham, was given by D. Viswanath and N. Trefethen in [8] where a 
precise meaning to the expression "triangular systems are, in general, ill- 
conditioned" was given. Indeed, if L„ denotes a random triangular n x n 
matrix (whose entries are independent standard Gaussian random variables) 
and Kn = ||Ln||||-^n^ll is its condition number (which is a positive random 
variable) then, it is shown in [8], 

^/k^ -^ 2 almost surely 

as n ^ cx). A straightforward consequence of this result is that the expected 
value of log Kn satisfies E(logK„) = il{n). 

The goal of this paper is to close the gap by giving a precise meaning, 
in the sense of [8], to the other half of the statement above namely, to the 
expression "triangular systems are generally solved to high accuracy." More 
precisely, we consider the mixed condition numbers m^{Ln) and m[Ln, bn) 
for the problems of matrix inversion and linear equation solving, respectively, 
for a random triangular L„ as above and a random 6„ G IR". Then, we show 
that 

E(logmt(L„)), E(logm(L„,6„)) = O(logn). (1) 

From the bound on E(log m,(L„, bn)) it follows that the average loss of pre- 
cision in the solution of random triangular systems is small. From that on 
E(logm,^(L„)), that the one for matrix inversion is small as well. One can 
therefore replace the summary above by the following: 

Triangular systems of equations are generally solved to high ac- 
curacy because their backward error analysis yields small compo- 



^Skeel called it "componentwise." In this paper, however, following the notation in- 
troduced in [3], we will use this word for the condition numbers measuring both data 
perturbation and computed errors in a componentwise fashion. 



nentwise perturbations and triangular matrices are, in general, 
well conditioned for these perturbations. 

The results showing (1), Theorems 2 and 3 below, are proved in the more 
general context of sparse matrices (i.e. matrices with a fixed pattern of zeros) 
and componentwise condition numbers (which ensure high relative accuracy 
in each component of the computed solution A~^ or x). Besides triangular 
matrices, these results apply to other classes of sparse matrices such as, for 
instance, tridiagonal matrices. In the process of proving them, we found 
useful to estimate as well the average mixed condition for the computation 
of the determinant. 



2 Preliminaries 

Condition numbers measure the worst-case magnification of a small data 
perturbation in the computed outcome. As originally introduced by Tur- 
ing [7], they were norniwise in the sense that both the data perturbation 
and the outcome's error are measured using norms (in the space of data 
and outcomes respectively). In contrast, mixed condition numbers measure 
data perturbation componentwise, and componentwise condition numbers 
measure both data perturbation and outcome's error in this way. 

To define these condition numbers the following form of "distance" func- 
tion will be useful. For points u,v £ MP we define ^ = {wi, . . . , Wp) with 



Wi 



Ui/vi if f j / 
if Uj = f i = 

oo otherwise. 



Then we define 



d{u,v) 



u — V 



Note that, if d{u,v) < oo, 

d{u,v) = min{z^ > | juj — Vi\ < z^|fj| for i = 1, . . . ,p}. 

For (5 > and a G RP we denote B{a, (5) = {x G M^ | d{x, a) <5}. 

Definition 1 Let D C TRP and F :T) ^ H'^ be a continuous mapping. Let 
aeV such that F{a) / 0. 



(i) The mixed condition number of F at a (with respect to a norm || ||q on 
M'^) is defined by 

m{F,a) = hm sup " ^ ^ ^ ^"^ 



s^o ^eB{a,5) \\F{a)\\q d{x,a)' 

(ii) Suppose F{a) = (/i(a), . . . , /g(a)) is such that fj{a) / for j = 
1, . . . , g. Then the componentwise condition number of F at a is 

/^ N , d{F(x),F{a)) 

c(F,a) = hm sup ^ ;/' ,^ '' . 

<5^0 a:eB(a,5) d[X , O) 

Proposition 1 For all a ^ T) and any monotonia norm in IR'^, Tn{F,a) < 
c{F,a). 

Proof. For all x G B{a,6) andalH < q, \F{x)i-F{a)i\ < d{F{x),F{a))\F{a) 
Since || || is monotonic (cf. [1]), this implies ||F(x)— F(a)|| < (i(F(x),F(a))||F(a)| 
and hence the statement. □ 

In all what follows, for n G IN, we denote the set {1, . . . , n} by [n]. 

Definition 2 We denote by ^ the set of n x n real matrices and by S its 
subset of singular matrices. Also, for a subset S C [n]^ we denote 

^s = {Ag ^ \ if {i,j) S then aij = 0}. 

We denote by ^s the space of random n x n matrices obtained by setting 
aij = if {i,j) G S and drawing all other entries independently from the 
standard Gaussian A^(0, 1). As above, if S" = [n]^, we write simply ^. 

In the rest of this paper, for non-singular matrices A, A' , we denote their 
inverses by F, F', respectively. Also, we denote by Ar,ij\ the sub-matrix of A 
obtained by removing from A its ith. row and its jth column. 

3 Determinant Computation 

We consider here the problem of computing the determinant of a matrix A 
and its componentwise condition number Cdet(^)- The main result of this 
section is the following. 



Theorem 1 For S C [n]^ and t > 2|S'| we have 

Prob{cdet(A) > t} < \S\^^. 

Average loss of precision (in a base b) is measured by the expected value 
of the logarithm (in that base) of the condition number. We may use The- 
orem 1 to obtain one such result for the computation of the determinant. 
To avoid problems caused by this condition number being less than 1 we 
consider the function log_,_(x) defined to be log(x) if a; > 1 and otherwise. 

Corollary 1 For a base b > 1, E(log_^ Cdet(^)) < SloglS*] + j^ where E 
denotes expectation over A G ^5. 

Towards the proof of Theorem 1 we first obtain explicit expressions for 
Cdet(^)- We begin by noting that taking F : ^ ^ H to he F{A) = det{A) 
in Definition 1 we obtain, for A £ ^ \ S, 

,^, ,. |det(^')-det(A)| 

Cdet(^) = lim sup — — — . 

^ ' ^^^A'^B^Afi) <5|det(^)| 

Also, for A G S, we have Cdet(^) = if 

|det(y4')l 
lim sup = 

^-^'^ A' aB(A,f,) ^ 

and Cdet(^) = 00 otherwise. Note that Ccjet(O) = 0. 
Lemma 1 For A £ .J^ \T,, 

Cdet(-4) = ^ |aij7ji|. 

i,j£[n] 

For AeTj, Cdet(^) = if 

Y^ \aij det{A^ij))\ = 

i,jG[n] 

and CdetiA) = cxd otherwise. 

Proof. Let A G ^. For any i G [n], expanding by the ith row, 

det{A)= j;(-ir+%,, det(^(,,)). 



Hence, for all i,j £ [n], 



ddetjA) 



det(^(y)). 



Using Taylor's expansion and these equalities we obtain 

det(y4') = det(^) + ^ (a^^- - a^-) det(A(ij-)) + O [\\A' - Af) . 

i,j£[n] 

Here, the norm in ||^' — ^|| is not relevant since all norms in ^ are equiv- 
alent. By choosing a monotonic norm we have that \i A' G B{A, 8) then 
\\A' - A\\ = 0{5). It follows that, for A E, 

CH (A) - hm SUP |det(^0-det(A)| 
"^-(^) - fe^,|,"(% 5|det(A)| 



lim sup > 



Koij -aij)det(A(ij))| 



5-o^,eB(A,5).^^^^ (5|det(A)| 



,. / >r- IKj - aii)det(^(y))| |a^ -a^l 

= hm sup > — — - — — — : — -, ; — < 5 \ . 

5-0 "^l.^, <5|det(^)| 

The supremum above is attained by taking a^ • = aij{l + 5) and therefore 

Cdet(^) = 



E 



aij det(ylj 



det(A) 



ije[n] 



If A G S it similarly follows that 

det(A') 



lim sup 



^ \aij det(^(y))| 

j,je[n] 



and hence the statement. 



D 



Lemma 2 Let p, q he two fixed vectors in H" such that 
X ~ N{0, Id„) then, for all t > 2, 



Prob 



x^p 



> t)- < 



1 



< Ikll. If 



Proof. Let v = \\q\\. By the orthogonal invariance of A^(0, Id„) we may 
assume q = (z^, 0, . . . , 0). Also, by appropriately scaling, we may assume 
that V = \. Note that then, ||p|| < 1. We therefore have 



Prob 



x^ -p 



X' 



> t 



Prob 



Prob 



^1+ E 

je{2,...,n} 



■p\ H aZ 



Xi 



> t 



> t 



(2) 



Prob <! — > — } + Prob <^ — < — 

xi a \ \xi a 



where Z = N{0, 1) independent of xi and a = \Jv2 + ■ ■ ■ + Pn ^ 1- Here we 
used that a sum of independent centered Gaussians is a centered Gaussian 
whose variance is the sum of the terms variances. Note that in case a = 
the statement is trivially true. 

Since the xi and Z are independent A^(0, 1), the angle 6 = arctan (Z/xi) 
is uniformly distributed in [—it/2, 7r/2] and we have, for 7 G H, 



Prob <! — > 7 

Xl 



1 /vr 
Prob{0 > arctan 7} = — f arctan 7 

I f^ I 1 f^ 1 1 

-Trdt < — / -^dt = — . 

^ vr ./^ t vr7 



vr ./^ 1 + t 



7 



Similarly, for a gTR, 



Prob < — < cTf = 1 — Prob {6* > arctan a} = 1 ( arctan a ] 

Ixi-J ^- ^ 7rV2 J 



1 /vr / n\ 1 
- - - arctan -0- < — 

TT \ 2 J vr — (T 



Using these bounds in (2) with 7 = — — and a = — we obtain 



Prob 



X p 



> t\ < 



a a 

+ 



a 2t 2 t 1 

< -^ < 



TT \t — Pl t + PlJ TT t'^ — pf Vrt^ — 1 t 



the last since t > 2. 



a 



Lemma 3 Let S CI [n]^ be such that ^s ^ S- Then, for all A G 
Cdet(^) = 0. 



Proof. Since ^s ^ S and A G ^s we have B{A,b) C S for all (5 > 0. 
The result now follows. □ 

Lemma 4 Let S C. [n]^ such tiiat ^5 ^ S. TLeii 

Prob(y4 is sineular) = 0. 

Proof. The set of singular matrices in ^5 is the zero set of the re- 
striction of the determinant to ^5. This restriction is a polynomial in H'"^' 
whose zero set, if different from TR'^', has dimension smaller than IS*!. 

D 

Proof of Theorem 1. Case (i): ^5 C S. In this case, the desired 
inequality is trivial by Lemma 3. 

Case (ii): ^s 2 ^- By Lemma 4, with probability 1, ^ is non-singular. 
So, by Lemma 1, 

Prob{cdet(^) > t} = Prob < 2_, l^^ijljil ^ * 

I j,je[n] 



Prob^ E 



ajjdet(^(jj)) 



det(^) 



>ty (3) 



Assume (1, 1) G S* and let x = ai be the first column of A. Also, let 
I = {i £ [n] \ {i, 1) G S"} and xj be the vector obtained by removing entries 
Xi with i ^ I. Then, 

X5~iV(0,Id|,|). (4) 

Let q = (det A(ii), . . . ,det A(„i)) and qi be the vector obtained by re- 
moving entries qi with i ^ I. Clearly, qj is independent of xj. Using this 
notation, the expansion by the first column yields 

det{A) = E(-l)^+ia,idet(^(a)) = xjqi. 

In addition, aiidet(^(ii)) = xj{qiei) where ei is the vector with the first 
entry equal to 1 and all others equal to 0. Hence, 

aiidet(A(n)) _ xjfaei) 
det(^) ~ x]qi 



Using (4) and Lemma 2 (with p = {qiei) and q = qj) we obtain, for z > 2, 



Prob 



aiidet(^(ii)) 



det(^) 



> z)- < 



z 



The same bound can be proven for all {i,j) G S. Using these bounds with 
z = T^ and (3) we obtain, 



Prob{cdet(A) >t}<Yl ^^°^ 



ttij det{A(^ij)] 



det{A) 



- \s\i - t 



a 



The proof of Corollary 1 follows from the following result by taking 
Z = CdetU) and to = |5'p. 

Proposition 2 Let Iq > and Z > 1 be a random variable satisfying that 
Prob{Z >t}< tot-^ for all t>to. Then E(log Z) < log to + nTfe where b> 1 
is the base of the logarithm. 

Proof. We have ProbjlogZ > t} < to&"* = &-(*-i°s*n) for all t > logto- 
Therefore, 

/•oo poo 1 

E(logZ)=/ Prob{logZ>s}ds<logto+/ 6"(*"'°s*")cit = logto+^ 
Jo J login In 6 



4 Matrix inversion 

We now focus on the problem of inverting a matrix A and its componentwise 
condition number c'{A). Our main results in this section are the following. 

Theorem 2 Let 5 C [n]^ be such that Jis 2 S. Then, for all t > 2\S\, 

Prob{ct(^) > t} < A\S\'^n'^- 

where Prob denotes probability over A £ ^s- 

Corollary 2 Let S* C {nf be such that J^s 2 S. Then, 

E(log , (ct(.4))) < 21ogn + 21og \S\ + log4 + -^ 

mo 



where E denotes expectation over A £ ^s- 



u 



Remark 1 Note that, for all monotonic norm on ^5, the bound above also 
holds for m{A) by Proposition 1. This is in contrast with the lower bound 
linear in n for the expected value of the logarithm of normwise condition 
numbers, etc. 

Definition 1 yields expressions for the (mixed and componentwise) con- 
dition numbers of a matrix A by taking D = ^ \ S and F : ^ \ S ^ ^ 
given by F[^A) = A'^'^. For /c,^ G [n] such that '^m / 0, we let 



4(A) = lim sup \jk^ 



and for k,i ^ [n] such that 7^^ = 0, we let c|,^(A) = if 



lim sup 

^^'^ A'eB{A,5) 

and ciJA) = 00 otherwise. Then 



■key 
h'ke - 7fed 



c^A) = max 4^ (A). 



k,ee[n 
Similarly, for a norm || || on ^, 



m{A) = lim sup 



Lemma 5 For A e ^ \T, and k,£ e [n], 

Proof. We divide the proof by cases. Case (i): jki / 0. 

Let 5 > he sufficiently small so that B{A,6) n S = and, for all 

A'eB{A,6), '^"'^'^2t~{T^^^ < 1. Let A' eB{A,6). 
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Since jke 



dct(A(^fc)) 
dct{A) ■ 



'Ike - ^kl 



det(^) /det(A^ ^ det(^(^fc)) ' 
det(A(^fc)) I det(A') det(A) 

det(A) det(^^(^fe)) ^ 
det(A(^fc)) det(^') 



1 + 



det(A((.j.)) 



1 + 



dct(A')-dct(A) 



1 



dct(A) 

'^^^^(-^(te))-'^^^^(-^(<ifc)) _ det(A')-det(A) 
det{A(tt)) det(A) 



1 + 



det(AO-det(A) 



Using that 



dct(A')-dct(A) 
det(A) 



dct(A) 
<1, 



7fc£ - 7fe^ 



IM 



< 



det(^(,fc))-det(A(tt)) 
det(A(tt)) 


+ 


dct(A')-dct(A) 
dct(A) 


1- 


dct(A' 
d 


)-dct(A) 
.t(A) 





and therefore 



sup 

A'eB(A,5) 



Iki - Iki 



SUPA'eB(A,5) 



< 



'J'^t(A'(^fc))-det(A(^fc)) 
<5det(A(ffc)) 



+ S'^VA'eB{A,5) 



det(A')-det(A) 
(5dct(A) 



1 - SUp^/gg(^^^) 

Taking hmits for 5 ^ on both sides we get 
Case (ii): 7^^ = and 



dct(A')-dct(A) 
dct{A) 



Um sup 



Wm\ 



^-^^ A' eB{A,5) ^ 
In this case, cr.AA) = and the statement holds. 



Case (iii): 7^^ = and 



/ hm sup 



Wm\ 



Um sup 



det(^^ 



tkl 



^^^A'&B{A,5) 5-.0^,ge(^^5) (5|det(^')l 
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In this case 



lim sup ^—^^ / 



and therefore Cdet(^^fc) = oo. The statement holds as weh. □ 

Proof of Theorem 2. By definition of c^A), 

Prob{c^(^) >t} = Prob i max 4<?(^) > t\ < ^ Prob{4^(A) > t}. 

'^'''^^'"^ J k,ee[n] 

By Lemma 4, with probabihty 1, A is non-singular. So, we can apply 
Lemma 5 to obtain 

Prob{4,^(^) >t} < Prob |cdet(^) > ^1 + Prob Ldet{A^M)) > ^1 
the last inequality by Theorem 1. The statement now follows. □ 



5 Linear Equations Solving 

We finally deal with the problem of solving linear systems of equations. 
Theorem 3 Let S C [n]^ be such that Jis 2 S. Then, for aU t > 2(|S'|+n), 

Prob{c(A,6) >t}< lOlS'pn- 
where Prob denotes probabihty over {A,b) G ^s x -^(0, Id„). 
Corollary 3 Let S* C [n]^ be such that J^s 2 S. Then, 

E(log+(c(^,6))) <logn + 21og|5| + loglO + — . □ 

Definition 1 yields again expressions for the (mixed and componentwise) 
condition numbers of a pair i^A, b) by taking D = (^ \ S) x IR" and F : 
(^ \ S) X M" ^ M" given by F{A, b) = A^^b. 

For A £ ^ \T, and b G IR" we denote x = A~^b. For k G [n] such that 
Xfc 7^ we let 

Cfc(^,o) = lim sup 
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For k £ [n] such that Xfc = we let Ck{A, 6) = if 

hm sup = 

^^0{A',b')£Bi{A,b),S) 

and Cfc(A, &) = cx) otherwise. Then 

c{A, h) = max c^ {A,h). 

ke[n] 

Similarly, for a norm || || in M", 



m{A, h) = lim sup 



\x' — x\ 



'5^0(A',b')eS({A,fe),5) o\\x\\ 

In what follows let R^ be the matrix obtained by replacing the fcth 
column of A by h. 

Lemma 6 For any non-singular matrix A and k £ [n], 

Ck{A, h) < Cdet(^) + Cdet(-Rfe)- 

Proof. By Crammer's rule, 

_ det(i?fc) 
""' " det(A) • 

The rest of this proof is similar to the proof of Lemma 5. □ 

Proof of Theorem 3. It follows the hues of that of Theorem 2. First, 
we get 

Prob{c(^,6) > t} < ^ Prob{cfc(^,6) > t}. 

feG[n] 

Then, we apply Lemma 6 (using that, with probability 1, ^ S) and the 
fact that \S\ > n to get 

Prob{cfc(A, h)>t] < Prob |cdet(^) ^ ^} + ^'°^ |cdet(i?fc) > I 

< 2\S\'^- + 2{\S\+nf-<W\S\^- 
from which the statement follows. □ 
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6 Additional Remarks 

(1) To obtain bounds for the average loss of precision of random tri- 
angular systems one may combine Theorem 3 with the following result by 
Wilkinson [9, Ch.3,§19] which we quote as given in [4]. 

Theorem 4 Let T G H"^" be a nonsingular triangular matrix, an assume 
nu < 0.1 (here u is the round-off unit). Then, the computed solution x to 
the system Tx = b satisfies 

{T + E)x = b, 

where, for some universal constant c, 

\eij\ < i\i - j\ +2)cu\tij\. □ 

The use of c{A) actually yields average loss of precision with the latter 
measured componentwise in the computed solution. 

(2) The bound in Corollary 2 appears to be worse than what computer 
simulations suggest E(log c^(L„)) should be. In [2] matrices L„ were gener- 
ated for various values of n and an experimental mean of E(log cT(L„)) was 
obtained from these values. A linear regression for these means shows a best 
fit of 3.065 logn— 1.1466. A probable source of (a good part of) the difference 
of this value with the estimate 6 logn + 0(1) following from Corollary 2 is 

the broad bound Prob |maxfc^^g[„] cl^{A) > t\ < Y.k,ie[n] ^'^^H^lei^) ^ 
in the proof of Theorem 2. In addition to this, the bound E(m^(L„)) < 
E(cT(L„)) following from Proposition 1 may be coarse as well. Numerical 
experiments in [2] suggest a best fit of E(logm^(L„)) ~ 1.53341og n— 0.5723. 

(3) Section 6 of [8] discusses stability of Gaussian elimination. Having 
shown that, almost surely, k(L„) ~ 2" the authors reflect on how this behav- 
ior can be reconciled with the fact that "Gaussian elimination is overwhelm- 
ingly stable." They point out that "The reason appears to be statistical: 
the matrices A for which ||L^^|| is large occupy an exponentially small pro- 
portion of the space of matrices" a claim for which experimental evidence is 
given in [6]. It would thus follow that "i/ie matrices L produced by Gaussian 
elimination are far from random.''^ 

Our results show that, in addition, Gaussian elimination needs much 
less than producing matrices L in the vanishingly small set of triangular 



14 



matrices with k{L) small. It is enough to produce matrices L outside the 
vanishingly small set of matrices with c{L, b) large. 

Aknowledgement. We are grateful to Ernesto Mordecki for helpful dis- 
cussions. 
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